add inplace logic into new_executor #35618
Thanks for your contribution!
}
}

}  // namespace
GetTensorFromVar and GetMutableTensorFromVar are also both defined in paddle/fluid/framework/details/share_tensor_buffer_functor.cc.
Could these two functions be extracted into framework-level utils so they can be reused?
Done, thanks!
@@ -243,6 +304,16 @@ void InterpreterCore::RunInstruction(const Instruction& instr_node) {
      instr_node.kernel_func_.operator_base_)
      ->InferShape(instr_node.infershape_ctx_.get());

  if (FLAGS_new_executor_use_inplace) {
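Once the underlying buffer is shared, the Variable's ref count also needs to be incremented. Because out and input share the same data, if out is used by another op, the input's data must not be released early.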
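> Once the underlying buffer is shared, the Variable's ref count also needs to be incremented. Because out and input share the same data, if out is used by another op, the input's data must not be released early.

That will not happen. The share is performed before the Instruction runs, and at that point In and Out each hold a shared_ptr to the holder. After the Instruction runs and In is handed to the GC, the GC only decrements the shared_ptr's RefCount by one, so Out can keep holding the holder as usual.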
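The ownership argument in this reply can be illustrated with a minimal sketch using plain `std::shared_ptr`. `Holder`, `ToyTensor`, `ShareBufferWith` and `UseCountAfterInputReleased` are hypothetical stand-ins invented for this illustration, not Paddle's actual classes or functions, and the `reset()` call only approximates what the GC does when it releases the input.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the object that owns the underlying buffer.
struct Holder {
  std::vector<float> data;
};

// Hypothetical stand-in for a tensor that refers to a holder.
struct ToyTensor {
  std::shared_ptr<Holder> holder;

  // Share the buffer of `src` instead of allocating a new one.
  void ShareBufferWith(const ToyTensor& src) { holder = src.holder; }
};

// Returns the use count seen by the output after the input is released.
inline long UseCountAfterInputReleased() {
  ToyTensor in{std::make_shared<Holder>()};
  in.holder->data.assign(4, 1.0f);

  ToyTensor out;
  out.ShareBufferWith(in);  // done before the instruction runs
  assert(in.holder.use_count() == 2);  // In and Out each hold one reference

  in.holder.reset();  // assumed GC behaviour: drop only the input's reference

  // Out still owns the buffer, so the data is intact.
  assert(out.holder->data.size() == 4);
  return out.holder.use_count();
}
```

In this sketch the buffer is freed only when the last `shared_ptr` goes away, which is why no extra reference count on the variable is needed to keep the output's data alive.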
* add inplace logic into new_executor, test=develop
* check shape and add inplace FLAGS, test=develop
* refine, test=develop
* refine, test=develop
PR types
New features
PR changes
Others
Describe
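Adds an inplace strategy. With inplace enabled, mutable_data is executed fewer times, so GPU memory usage drops somewhat and speed improves somewhat.
Elapsed time: 23.4s before, 20.4s after.
GPU memory: 1150M before, 1086M after.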
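As a rough illustration of why sharing the input buffer can save an allocation, here is a toy sketch of an op that reuses its input's buffer when a flag is set and the shapes match. It is not the PR's implementation: `Buffer`, `ToyTensor`, `MutableData`, `RunScale` and `g_alloc_count` are all hypothetical names, and the boolean parameter only mimics the role of `FLAGS_new_executor_use_inplace`.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Buffer {
  std::vector<float> data;
};

struct ToyTensor {
  std::vector<int> shape;
  std::shared_ptr<Buffer> holder;

  std::size_t numel() const {
    std::size_t n = 1;
    for (int d : shape) n *= static_cast<std::size_t>(d);
    return n;
  }
};

// Counts how many times the toy MutableData had to allocate.
static int g_alloc_count = 0;

// Allocates only when the tensor has no buffer that is large enough.
inline float* MutableData(ToyTensor* t) {
  if (!t->holder || t->holder->data.size() < t->numel()) {
    t->holder = std::make_shared<Buffer>();
    t->holder->data.resize(t->numel());
    ++g_alloc_count;
  }
  return t->holder->data.data();
}

// Runs out = in * 2 and returns the number of allocations performed.
inline int RunScale(bool use_inplace) {
  g_alloc_count = 0;

  ToyTensor in{{2, 3}, nullptr};
  float* src = MutableData(&in);  // the input always needs its own buffer
  for (std::size_t i = 0; i < in.numel(); ++i) src[i] = 1.0f;

  ToyTensor out{{2, 3}, nullptr};
  // Share the buffer only when the flag is on and the shapes agree.
  if (use_inplace && in.shape == out.shape) out.holder = in.holder;

  float* dst = MutableData(&out);  // allocates only if nothing was shared
  for (std::size_t i = 0; i < out.numel(); ++i) dst[i] = src[i] * 2.0f;
  return g_alloc_count;
}
```

In this sketch `RunScale(false)` allocates twice and `RunScale(true)` allocates once. The shape check mirrors the "check shape" commit only in spirit; the real conditions for inplace reuse in the executor are not shown in this conversation.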